Inexact Graph Matching: A Case of Study
Authors
Abstract
Inexact graph matching has become an important research area because it is used to find similarities among objects in several real domains, such as chemical and biological compounds. Let G and G′ be input labeled graphs. We present an algorithm capable of finding a subgraph S of G, where S is isomorphic to G′ and the corresponding labels of the vertices and edges of S and G′ need not be the same (inexact matching). We use a list-code based representation without candidate generation, in which a step-by-step expansion is implemented. The proposed approach works with both directed and undirected graphs. We conducted a set of experiments on a genome database to show the effectiveness of our algorithm. Our experiments show a promising method for use with scalable graph matching tools that can be applied to areas such as Machine Learning (ML) and Data Mining (DM).
Introduction
Graphs are a powerful and flexible knowledge representation used to model simple and complex structured domains (Cook & Holder 1994). This representational power and flexibility is the main reason the graph-based representation model has been adopted by researchers in different areas such as ML and DM (Cook & Holder 1994; Kuramochi & Karypis 2002). An important problem in ML and DM is finding similarities between objects. With a graph-based representation, the problem turns into finding similarities between graphs, which includes tasks such as exact and inexact matching, where graph / subgraph isomorphism detection is a critical operation (Cook & Holder 1994). The task is not easy, because the subgraph isomorphism problem is known to be NP-complete (Michael & David 2003); thus, in the worst case, the time to solve the decision problem is exponential unless P=NP.
Exact matching (two graphs are similar if their topology and labeling are identical) is a widely studied problem for which several works have been developed, each with different objectives.
For example, Subdue (Cook & Holder 1994) is an algorithm that implements a computationally constrained beam search; however, the algorithm may not always find an isomorphism when one exists (although it can handle the inexact problem). Some algorithms reduce the computational complexity by imposing topological restrictions on the input graphs (Luks 1982). Other subgraph isomorphism projects, such as Ullman (Ullman 1976), VF2 (Cordella et al. 2001) and Nauty (Brendan 1981), are not able to work with labeled graphs, because many of them are oriented only toward solving mathematical problems and do not consider other classes of problems in which labels carry important information. Other works explore ideas in which completeness is not sacrificed. AGM (Inokuchi & Washio 2003), FSG (Kuramochi & Karypis 2002), gSpan (Xifeng & Han 2002) and SI-COBRA (Olmos, Gonzalez, & Osorio 2005) are algorithms that use strategies and representations aimed at reducing the number of operations performed and thus at being more efficient.
Copyright © 2006, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
On the other hand, inexact graph matching is an important graph-theoretical problem because it is used to find inexact similarities between objects. The inexact matching task consists of finding a distortion or variation between two input graphs for which an exact match may not exist (Cook & Holder 1994; Hlaoui & Wang 2002; Cordella et al. 1996). Throughout this work, we consider an inexact match between two graphs in the sense that they have an identical topology (there exists a bijection between the vertices and edges of the graphs); nevertheless, the labels of the vertices and edges might not be the same.
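The notion of inexact match used in this work (identical topology, possibly different labels) can be illustrated with a small check. The following is a minimal sketch, not the authors' method: it assumes directed graphs stored as plain dictionaries of vertex labels and edge labels, and the function name `count_label_mismatches` and its parameters are hypothetical. Given a candidate vertex mapping from S to G′, it verifies that the mapping preserves topology and, if so, counts how many labels differ.

```python
from typing import Dict, Hashable, Optional, Tuple

Vertex = Hashable
Edge = Tuple[Vertex, Vertex]  # directed edge (source, target); an assumption of this sketch


def count_label_mismatches(
    s_vlabels: Dict[Vertex, str],
    s_elabels: Dict[Edge, str],
    p_vlabels: Dict[Vertex, str],
    p_elabels: Dict[Edge, str],
    mapping: Dict[Vertex, Vertex],
) -> Optional[int]:
    """Return the number of differing labels if `mapping` (S -> G') is a
    topology-preserving bijection, otherwise None."""
    # The mapping must be a bijection between the two vertex sets.
    if set(mapping) != set(s_vlabels):
        return None
    if set(mapping.values()) != set(p_vlabels) or len(mapping) != len(p_vlabels):
        return None
    # Every edge of S must map onto an edge of G'; equal edge counts then
    # make the induced edge mapping a bijection as well.
    if len(s_elabels) != len(p_elabels):
        return None
    mismatches = 0
    for (u, v), label in s_elabels.items():
        image = (mapping[u], mapping[v])
        if image not in p_elabels:
            return None  # topology differs, so this is not a match at all
        mismatches += label != p_elabels[image]
    for v, label in s_vlabels.items():
        mismatches += label != p_vlabels[mapping[v]]
    return mismatches
```

A result of 0 corresponds to an exact match under this mapping; a positive result corresponds to an inexact match in the sense described above, and `None` means the two graphs do not share the same topology under the given mapping.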
In this work, we present an algorithm capable of finding a subgraph S of G, where S is isomorphic to G′ and the corresponding labels of the vertices and edges of S and G′ need not be the same (inexact matching). We use a list-code based representation without candidate generation, in which a step-by-step, depth-first expansion is implemented. The proposed approach works with both directed and undirected graphs. We conducted a set of experiments on genome databases, in collaboration with biology experts, to study the effectiveness of our method (which has already been applied to other theoretical and practical domains). Our experimental results show a promising method for use with scalable graph matching tools that can be applied to research areas such as Machine Learning (ML) and Data Mining (DM).
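To make the idea of a step-by-step, depth-first expansion concrete, here is a minimal sketch of a generic backtracking search. It is not the list-code representation of the paper, whose details are not given in this excerpt. It assumes directed graphs stored as dictionaries, and it assumes that a match only requires every edge of G′ to have a counterpart in G (extra edges in G among the matched vertices are tolerated). The names `find_matches` and `mismatch_count` are hypothetical.

```python
from typing import Dict, Hashable, Iterator, Tuple

Vertex = Hashable
VertexLabels = Dict[Vertex, str]
EdgeLabels = Dict[Tuple[Vertex, Vertex], str]  # directed edges (source, target)


def find_matches(
    g_v: VertexLabels, g_e: EdgeLabels, p_v: VertexLabels, p_e: EdgeLabels
) -> Iterator[Dict[Vertex, Vertex]]:
    """Yield every mapping from the vertices of G' (pattern) to distinct
    vertices of G such that each pattern edge lands on an edge of G.
    Labels are ignored during the search."""
    order = list(p_v)  # pattern vertices are assigned one at a time, in this order

    def consistent(mapping: Dict[Vertex, Vertex]) -> bool:
        # Check only pattern edges whose two endpoints are already mapped.
        return all(
            (mapping[a], mapping[b]) in g_e
            for (a, b) in p_e
            if a in mapping and b in mapping
        )

    def expand(mapping: Dict[Vertex, Vertex], depth: int):
        if depth == len(order):
            yield dict(mapping)  # complete mapping found
            return
        pv = order[depth]
        used = set(mapping.values())
        for gv in g_v:
            if gv in used:
                continue
            mapping[pv] = gv  # expand by one vertex
            if consistent(mapping):
                yield from expand(mapping, depth + 1)  # go deeper
            del mapping[pv]  # backtrack

    yield from expand({}, 0)


def mismatch_count(
    mapping: Dict[Vertex, Vertex],
    g_v: VertexLabels, g_e: EdgeLabels, p_v: VertexLabels, p_e: EdgeLabels,
) -> int:
    """Count vertex and edge labels that differ under a topology-preserving mapping."""
    vertex_diff = sum(p_v[p] != g_v[g] for p, g in mapping.items())
    edge_diff = sum(
        label != g_e[(mapping[a], mapping[b])] for (a, b), label in p_e.items()
    )
    return vertex_diff + edge_diff
```

In this sketch an undirected graph can be encoded by storing each edge in both directions. The search is exponential in the worst case, consistent with the NP-completeness of subgraph isomorphism noted in the introduction.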
Similar resources
The Role of Higher-Order Constructs in the Inexact Matching of Semantic Graphs
Inexact pattern matching using semantic graphs has a wide-ranging use in AI systems, particularly in machine vision, case-based reasoning, and, recently, in intelligence analysis applications. While much previous work in the area has focused on matching simple flat graphs, there is increasing need for and use of complex graphical patterns with higher-order constructs: hierarchical graphs, cardina...
On the Matching Number of an Uncertain Graph
Uncertain graphs are employed to describe graph models with indeterministic information produced by human beings. This paper aims to study the maximum matching problem in uncertain graphs. The number of edges of a maximum matching in a graph is called the matching number of the graph. Due to the existence of uncertain edges, the matching number of an uncertain graph is essentially an uncertain var...
A network based approach to exact and inexact graph matching
In this paper a new approach to exact and inexact graph matching is introduced. We propose a compact network representation for graphs, which is capable of sharing identical subgraphs of one or several model graphs. The new matching algorithm NA works on the network and uses its compactness in order to speed up the detection process. Next, the problem of inexact graph matching is described and ...
Inexact graph matching by means of estimation of distribution algorithms
Estimation of distribution algorithms (EDAs) are a quite recent topic in optimization techniques. They combine two technical disciplines of soft computing methodologies: probabilistic reasoning and evolutionary computing. Several algorithms and approaches have already been proposed by different authors, but up to now there are very few papers showing their potential and comparing them to other e...
Algorithms for Labeled Graph Matching with Applications to Systems Biology
Labeled graphs are being used extensively in computational biology to represent entities, such as biochemical networks and pathways, RNA secondary structures, and phylogenetic trees. One of the major challenges is to provide techniques for inexact matching (and the resulting alignment) of such graphs in a way that optimizes some objective function. This function usually measures both the similar...